NSF PAR Search | NSF Public Access Repository


STK19: a critical factor coordinating transcription-coupled DNA repair

https://doi.org/10.1093/procel/pwaf007

Li, Shisheng; Li, Wentao (January 2025, Protein & Cell)
Federated generalized linear mixed models for collaborative genome-wide association studies

https://doi.org/10.1016/j.isci.2023.107227

Li, Wentao; Chen, Han; Jiang, Xiaoqian; Harmanci, Arif (August 2023, iScience)

Full Text Available
Privacy-preserving federated genome-wide association studies via dynamic sampling

https://doi.org/10.1093/bioinformatics/btad639

Wang, Xinyue; Dervishi, Leonard; Li, Wentao; Ayday, Erman; Jiang, Xiaoqian; Vaidya, Jaideep (October 2023, Bioinformatics)
Nikolski, Macha (Ed.)
Abstract Motivation: Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal secure guarantees, the substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS. Results: This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets, maintaining data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate our proposed methods using two real genomic datasets and demonstrate their robustness in the presence of between-study heterogeneity and skewed phenotype distributions using a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and the promise for its application for large-scale collaborative GWAS. Availability and implementation: The source code and data are available at https://github.com/amioamo/TDS.
Full Text Available
Data-driven retrieval of population-level EEG features and their role in neurodegenerative diseases

https://doi.org/10.1093/braincomms/fcae227

Li, Wentao; Varatharajah, Yogatheesan; Dicks, Ellen; Barnard, Leland; Brinkmann, Benjamin H; Crepeau, Daniel; Worrell, Gregory; Fan, Winnie; Kremers, Walter; Boeve, Bradley; et al (January 2024, Brain Communications)

Abstract Electrophysiologic disturbances due to neurodegenerative disorders such as Alzheimer’s disease and Lewy Body disease are detectable by scalp EEG and can serve as a functional measure of disease severity. Traditional quantitative methods of EEG analysis often require an a-priori selection of clinically meaningful EEG features and are susceptible to bias, limiting the clinical utility of routine EEGs in the diagnosis and management of neurodegenerative disorders. We present a data-driven tensor decomposition approach to extract the top 6 spectral and spatial features representing commonly known sources of EEG activity during eyes-closed wakefulness. As part of their neurologic evaluation at Mayo Clinic, 11 001 patients underwent 12 176 routine, standard 10–20 scalp EEG studies. From these raw EEGs, we developed an algorithm based on posterior alpha activity and eye movement to automatically select awake-eyes-closed epochs and estimated average spectral power density (SPD) between 1 and 45 Hz for each channel. We then created a three-dimensional (3D) tensor (record × channel × frequency) and applied a canonical polyadic decomposition to extract the top six factors. We further identified an independent cohort of patients meeting consensus criteria for mild cognitive impairment (30) or dementia (39) due to Alzheimer’s disease and dementia with Lewy Bodies (31) and similarly aged cognitively normal controls (36). We evaluated the ability of the six factors in differentiating these subgroups using a Naïve Bayes classification approach and assessed for linear associations between factor loadings and Kokmen short test of mental status scores, fluorodeoxyglucose (FDG) PET uptake ratios and CSF Alzheimer’s Disease biomarker measures. Factors represented biologically meaningful brain activities including posterior alpha rhythm, anterior delta/theta rhythms and centroparietal beta, which correlated with patient age and EEG dysrhythmia grade. 
These factors were also able to distinguish patients from controls with a moderate to high degree of accuracy (Area Under the Curve (AUC) 0.59–0.91) and Alzheimer’s disease dementia from dementia with Lewy Bodies (AUC 0.61). Furthermore, relevant EEG features correlated with cognitive test performance, PET metabolism and CSF AB42 measures in the Alzheimer’s subgroup. This study demonstrates that data-driven approaches can extract biologically meaningful features from population-level clinical EEGs without artefact rejection or a-priori selection of channels or frequency bands. With continued development, such data-driven methods may improve the clinical utility of EEG in memory care by assisting in early identification of mild cognitive impairment and differentiating between different neurodegenerative causes of cognitive impairment.
Full Text Available
Approximate Confidence Distribution Computing

https://doi.org/10.51387/23-NEJSDS38

Thornton, Suzanne; Li, Wentao; Xie, Minge (January 2023, The New England Journal of Statistics in Data Science)

Approximate confidence distribution computing (ACDC) offers a new take on the rapidly developing field of likelihood-free inference from within a frequentist framework. The appeal of this computational method for statistical inference hinges upon the concept of a confidence distribution, a special type of estimator which is defined with respect to the repeated sampling principle. An ACDC method provides frequentist validation for computational inference in problems with unknown or intractable likelihoods. The main theoretical contribution of this work is the identification of a matching condition necessary for frequentist validity of inference from this method. In addition to providing an example of how a modern understanding of confidence distribution theory can be used to connect Bayesian and frequentist inferential paradigms, we present a case to expand the current scope of so-called approximate Bayesian inference to include non-Bayesian inference by targeting a confidence distribution rather than a posterior. The main practical contribution of this work is the development of a data-driven approach to drive ACDC in either Bayesian or frequentist contexts. The ACDC algorithm is data-driven by the selection of a data-dependent proposal function, the structure of which is quite general and adaptable to many settings. We explore three numerical examples that both verify the theoretical arguments in the development of ACDC and suggest instances in which ACDC outperforms approximate Bayesian computing methods computationally.
Full Text Available
Relating Underlying Performance Objectives of Overground Walking to Observable Walking Mechanics using Predictive Musculoskeletal Simulations

https://doi.org/10.1109/ICORR55369.2022.9896553

Li, Wentao; Fey, Nicholas P. (July 2022, IEEE)

Full Text Available
Facilitating Federated Genomic Data Analysis by Identifying Record Correlations while Ensuring Privacy

Dervishi, Leonard; Wang, Xinyue; Li, Wentao; Halimi, Anisa; Vaidya, Jaideep; Jiang, Xiaoqian; Ayday, Erman (March 2022, AMIA Annual Symposium proceedings)

Full Text Available
Rapid, high-sensitivity analysis of oxyhalides by non-suppressed ion chromatography-electrospray ionization-mass spectrometry: application to ClO₄⁻, ClO₃⁻, ClO₂⁻, and BrO₃⁻ quantification during sunlight/chlorine advanced oxidation

https://doi.org/10.1039/d0ew00429d

Young, Tessora R.; Cheng, Shi; Li, Wentao; Dodd, Michael C. (August 2020, Environmental Science: Water Research & Technology)
A rapid and sensitive method is described for measuring perchlorate (ClO₄⁻), chlorate (ClO₃⁻), chlorite (ClO₂⁻), bromate (BrO₃⁻), and iodate (IO₃⁻) ions in natural and treated waters using non-suppressed ion chromatography with electrospray ionization and tandem mass spectrometry (NS-IC-MS/MS). Major benefits of the NS-IC-MS/MS method include a short analysis time (12 minutes), low limits of quantification for BrO₃⁻ (0.10 μg L⁻¹), ClO₄⁻ (0.06 μg L⁻¹), ClO₃⁻ (0.80 μg L⁻¹), and ClO₂⁻ (0.40 μg L⁻¹), and compatibility with conventional LC-MS/MS instrumentation. Chromatographic separations were generally performed under isocratic conditions with a Thermo Scientific Dionex AS16 column, using a mobile phase of 20% 1 M aqueous methylamine and 80% acetonitrile. The isocratic method can also be optimized for IO₃⁻ analysis by including a gradient from the isocratic mobile phase to 100% 1 M aqueous methylamine. Four common anions (Cl⁻, Br⁻, SO₄²⁻, and HCO₃⁻/CO₃²⁻), a natural organic matter isolate (Suwannee River NOM), and several real water samples were tested to examine influences of natural water constituents on oxyhalide detection. Only ClO₂⁻ quantification was significantly affected – by elevated chloride concentrations (>2 mM) and NOM. The method was successfully applied to quantify oxyhalides in natural waters, chlorinated tap water, and waters subjected to advanced oxidation by sunlight-driven photolysis of free available chlorine (sunlight/FAC). Sunlight/FAC treatment of NOM-free waters containing 200 μg L⁻¹ Br⁻ resulted in formation of up to 263 ± 35 μg L⁻¹ and 764 ± 54 μg L⁻¹ ClO₃⁻, and up to 20.1 ± 1.0 μg L⁻¹ and 33.8 ± 1.0 μg L⁻¹ BrO₃⁻ (at pH 6 and 8, respectively). NOM strongly inhibited ClO₃⁻ and BrO₃⁻ formation, likely by scavenging reactive oxygen or halogen species.
As prior work shows that the greatest benefits in applying the sunlight/FAC process for purposes of improving disinfection of chlorine-resistant microorganisms are realized in waters with lower DOC levels and higher pH, it may therefore be desirable to limit potential applications to waters containing moderate DOC concentrations (e.g., ∼1–2 mg C L⁻¹), low Br⁻ concentrations (e.g., <50 μg L⁻¹), and circumneutral to moderately alkaline pH (e.g., pH 7–8) to strike a balance between maximizing microbial inactivation while minimizing formation of oxyhalides and other disinfection byproducts.
Full Text Available
Characterization of disinfection byproduct formation and associated changes to dissolved organic matter during solar photolysis of free available chlorine

https://doi.org/10.1016/j.watres.2018.09.022

Young, Tessora R.; Li, Wentao; Guo, Alan; Korshin, Gregory V.; Dodd, Michael C. (December 2018, Water Research)
Full Text Available
Facilitators and Repressors of Transcription‐coupled DNA Repair in Saccharomyces cerevisiae

https://doi.org/10.1111/php.12655

Li, Wentao; Li, Shisheng (November 2016, Photochemistry and Photobiology)
